{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "05859721",
   "metadata": {},
   "source": [
    "# Question Answering\n",
    "\n",
    "This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different types of chains: `stuff`, `map_reduce`, `refine`, `map_rerank`. And You can find the origin notebook in [LangChain example](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb), and this example will show you how to set the LLM with GPTCache so that you can cache the data with LLM. You can also try this example on [Google Colab](https://colab.research.google.com/drive/1C3VBKn3L38opEyaOH-SlGQrANzHkR_5f?usp=sharing)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea19ae7e",
   "metadata": {},
   "source": [
    "## Go into GPTCache\n",
    "\n",
    "Please [install gptcache](https://gptcache.readthedocs.io/en/latest/index.html#) first, then we can initialize the cache.There are two ways to initialize the cache, the first is to use the map cache (exact match cache) and the second is to use the DataBse cache (similar search cache), it is more recommended to use the second one, but you have to install the related requirements.\n",
    "\n",
    "Before running the example, make sure the `OPENAI_API_KEY` environment variable is set by executing `echo $OPENAI_API_KEY`. If it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix/Linux/MacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems. And there is `get_content_func` for the cache settings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "04759af1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the content(only question) form the prompt to cache\n",
    "def get_content_func(data, **_):\n",
    "    return data.get(\"prompt\").split(\"Question\")[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0740dd59",
   "metadata": {},
   "source": [
    "### 1. Init for exact match cache"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3f19b149",
   "metadata": {},
   "outputs": [],
   "source": [
    "# from gptcache import cache\n",
    "# cache.init(pre_embedding_func=get_content_func)\n",
    "# cache.set_openai_key()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a06c2c25",
   "metadata": {},
   "source": [
    "### 2. Init for similar match cache\n",
    "\n",
    "When initializing gptcahe, the following four parameters are configured:\n",
    "\n",
    "- `pre_embedding_func`: pre-processing before extracting feature vectors, it will use the `get_file_name` method\n",
    "- `embedding_func`: the method to extract the text feature vector\n",
    "- `data_manager`: DataManager for cache management\n",
    "- `similarity_evaluation`: the evaluation method after the cache hit\n",
    "\n",
    "The `data_manager` is used to audio feature vector, response text in the example, it takes [Milvus](https://milvus.io/docs) (please make sure it is started), you can also configure other vector storage, refer to [VectorBase API](https://gptcache.readthedocs.io/en/latest/references/manager.html#module-gptcache.manager.vector_data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "169b0d11",
   "metadata": {},
   "outputs": [],
   "source": [
    "from gptcache import cache\n",
    "from gptcache.embedding import Onnx\n",
    "from gptcache.manager import CacheBase, VectorBase, get_data_manager\n",
    "from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n",
    "\n",
    "\n",
    "onnx = Onnx()\n",
    "cache_base = CacheBase('sqlite')\n",
    "vector_base = VectorBase('milvus', host='127.0.0.1', port='19530', dimension=onnx.dimension, collection_name='chatbot')\n",
    "data_manager = get_data_manager(cache_base, vector_base)\n",
    "cache.init(\n",
    "    pre_embedding_func=get_content_func,\n",
    "    embedding_func=onnx.to_embeddings,\n",
    "    data_manager=data_manager,\n",
    "    similarity_evaluation=SearchDistanceEvaluation(),\n",
    "    )\n",
    "cache.set_openai_key()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44538b8b",
   "metadata": {},
   "source": [
    "After initializing the cache, you can use the LangChain LLMs with `gptcache.adapter.langchain_models`. At this point **gptcache** will cache the answer, the only difference from the original example is to change `llm = OpenAI(temperature=0)` to `llm = LangChainLLMs(llm=OpenAI(temperature=0))`, which will be commented in the code block.\n",
    "\n",
    "Then you will find that it will be more fast when search the similar content, let's play with it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "726f4996",
   "metadata": {},
   "source": [
    "## Prepare Data\n",
    "First we [prepare the data](https://raw.githubusercontent.com/hwchase17/langchain/master/docs/extras/modules/state_of_the_union.txt). For this example we do similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook to highlight what to do AFTER you fetch the documents). You can learn more detail about Milvus in Langchain refer to [it](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/milvus.html?highlight=milvus)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "17fcbc0f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Milvus\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ef9305cc",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "loader = TextLoader('./state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "99ebc875",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_db = Milvus.from_documents(\n",
    "    docs,\n",
    "    embeddings,\n",
    "    connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d1eaf6e6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "query = \"What did the president say about Justice Breyer\"\n",
    "docs = vector_db.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a16e3453",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains.question_answering import load_qa_chain\n",
    "from langchain.llms import OpenAI\n",
    "\n",
    "from gptcache.adapter.langchain_models import LangChainLLMs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f64b7f8",
   "metadata": {},
   "source": [
    "## Quickstart\n",
    "If you just want to get started as quickly as possible, this is the recommended way to do it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "fd9e6190",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# llm = OpenAI(temperature=0) # using the following code to cache with gptcache\n",
    "llm = LangChainLLMs(llm=OpenAI(temperature=0))\n",
    "chain = load_qa_chain(llm, chain_type=\"stuff\")\n",
    "query = \"What did the president say about Justice Breyer\"\n",
    "chain.run(input_documents=docs, question=query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eea01309",
   "metadata": {},
   "source": [
    "If you want more control and understanding over what is happening, please see the information below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f78787a0",
   "metadata": {},
   "source": [
    "## The `stuff` Chain\n",
    "\n",
    "This sections shows results of using the `stuff` Chain to do question answering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "180fd4c1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chain = load_qa_chain(llm, chain_type=\"stuff\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "77fdf1aa",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'output_text': ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.'}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"What did the president say about Justice Breyer\"\n",
    "chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ea50ad0",
   "metadata": {},
   "source": [
    "## The `refine` Chain\n",
    "\n",
    "This sections shows results of using the `refine` Chain to do question answering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "fb167057",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chain = load_qa_chain(llm, chain_type=\"refine\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d8b5286e",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'output_text': '\\nThe president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service. He also praised Justice Breyer for his dedication to the rule of law and his commitment to justice and fairness.'}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"What did the president say about Justice Breyer\"\n",
    "chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f95dfb2e",
   "metadata": {},
   "source": [
    "**Intermediate Steps**\n",
    "\n",
    "We can also return the intermediate steps for `refine` chains, should we want to inspect them. This is done with the `return_refine_steps` variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "a5c64200",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chain = load_qa_chain(llm, chain_type=\"refine\", return_refine_steps=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "817546ac",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'intermediate_steps': [' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.',\n",
       "  ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.',\n",
       "  ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.',\n",
       "  '\\nThe president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service. He also praised Justice Breyer for his dedication to the rule of law and his commitment to justice and fairness.'],\n",
       " 'output_text': '\\nThe president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service. He also praised Justice Breyer for his dedication to the rule of law and his commitment to justice and fairness.'}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0f0bbdf",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "vscode": {
   "interpreter": {
    "hash": "b1677b440931f40d89ef8be7bf03acb108ce003de0ac9b18e8d43753ea2e7103"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
